Econ 715 


Lecture 6 

Testing Nonlinear Restrictions 1 

The previous lectures prepare us for the tests of nonlinear restrictions of the form: 

$$H_0 : h(\theta_0) = 0 \quad \text{versus} \quad H_1 : h(\theta_0) \neq 0. \qquad (1)$$

In this lecture, we consider the Wald, Lagrange multiplier (LM), and quasi-likelihood ratio (QLR) tests.
The Delta Method will be useful in constructing those tests, especially the Wald test. 

1 The Delta Method 

The delta method can be used to find the asymptotic distribution of $h(\hat\theta_n)$, suitably normalized, if
$d_n(\hat\theta_n - \theta_0) \to_d Y$.

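Theorem ($\Delta$-method): Suppose $d_n(\hat\theta_n - \theta_0) \to_d Y$, where $\hat\theta_n$ and $Y$ are random $k$-vectors, $\theta_0$ is a
non-random $k$-vector, and $\{d_n : n \ge 1\}$ is a sequence of scalar constants that diverges to infinity
as $n \to \infty$. Suppose $h(\cdot) : R^k \to R^\ell$ is differentiable at $\theta_0$, i.e.,
$h(\theta) = h(\theta_0) + \frac{\partial}{\partial\theta'}h(\theta_0)(\theta - \theta_0) + o(\|\theta - \theta_0\|)$ as $\theta \to \theta_0$. Then,

$$d_n(h(\hat\theta_n) - h(\theta_0)) \to_d \frac{\partial}{\partial\theta'}h(\theta_0)\,Y.$$

If $Y \sim N(0, V)$, then $\frac{\partial}{\partial\theta'}h(\theta_0)Y \sim N\left(0, \frac{\partial}{\partial\theta'}h(\theta_0)\,V\left(\frac{\partial}{\partial\theta'}h(\theta_0)\right)'\right)$.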

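Proof: By the assumption of differentiability of $h$ at $\theta_0$, we have

$$d_n(h(\hat\theta_n) - h(\theta_0)) = \frac{\partial}{\partial\theta'}h(\theta_0)\,d_n(\hat\theta_n - \theta_0) + d_n\,o(\|\hat\theta_n - \theta_0\|).$$

The first term on the right-hand side converges in distribution to $\frac{\partial}{\partial\theta'}h(\theta_0)Y$. So, we have the desired
result provided $d_n\,o(\|\hat\theta_n - \theta_0\|) = o_p(1)$. This holds because $\hat\theta_n \to_p \theta_0$ (since $d_n \to \infty$ and $d_n(\hat\theta_n - \theta_0) = O_p(1)$), so that

$$d_n\,o(\|\hat\theta_n - \theta_0\|) = \|d_n(\hat\theta_n - \theta_0)\|\,o_p(1) = O_p(1)\,o_p(1) = o_p(1)$$

and the proof is complete. □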
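
The variance formula in the theorem can be checked numerically. The following is a minimal sketch, not part of the original notes, that assumes a scalar-valued $h$ (so $\ell = 1$) and approximates the derivative by central differences. The function names, the evaluation point, and the matrix `V` are hypothetical choices made for illustration.

```python
def numerical_gradient(h, theta, eps=1e-6):
    """Central-difference approximation to the gradient of a scalar h at theta."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((h(up) - h(down)) / (2 * eps))
    return grad


def delta_method_variance(h, theta, V, eps=1e-6):
    """Return g' V g, where g is the (numerical) gradient of h at theta."""
    g = numerical_gradient(h, theta, eps)
    k = len(theta)
    return sum(g[i] * V[i][j] * g[j] for i in range(k) for j in range(k))


# Illustrative inputs (assumptions): h(theta) = theta1 * theta2 - theta3,
# evaluated at theta = (2, 3, 1) with V equal to the identity matrix.
h = lambda t: t[0] * t[1] - t[2]
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
avar = delta_method_variance(h, [2.0, 3.0, 1.0], V)
print(avar)  # the analytic gradient is (3, 2, -1), so the exact value is 14
```

Here the analytic gradient is available, so the numerical derivative is only a convenience; in applications one would normally code the derivative of $h$ directly.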


1 The notes for this lecture are largely adapted from the notes of Donald Andrews on the same topic. I am grateful
for Professor Andrews' generosity and elegant exposition. All errors are mine.

2 Tests for Nonlinear Restrictions

The $R^r$-valued function $h(\cdot)$ defining the restrictions and the matrix $\Omega_0$ (defined in Assumption
CF) are assumed to satisfy:

Assumption R: (i) $h(\theta)$ is continuously differentiable on a neighborhood of $\theta_0$.

(ii) $H = \frac{\partial}{\partial\theta'}h(\theta_0)$ has full rank $r$ ($\le d$, where $d$ is the dimension of $\theta$).

(iii) $\Omega_0$ is positive definite.

For example, we might have

$$h(\theta) = \theta_1 - \theta_2,$$
$$h(\theta) = \theta_1\theta_2 - \theta_3, \text{ or}$$
$$h(\theta) = \begin{pmatrix} \theta_1 \\ \theta_2 + \theta_3 - 1 \end{pmatrix}. \qquad (2)$$

The requirement that $\Omega_0$ is positive definite ensures that the asymptotic covariance matrix,
$B_0^{-1}\Omega_0 B_0^{-1}$, of $\sqrt{n}(\hat\theta_n - \theta_0)$ is positive definite.
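
As a worked illustration that is not part of the original notes, the derivative matrices for three example restrictions are written out below. The sketch assumes $d = 3$ and reads the third example as $h(\theta) = (\theta_1, \theta_2 + \theta_3 - 1)'$. In each case $H(\theta) = \frac{\partial}{\partial\theta'}h(\theta)$ has full row rank $r$ at every $\theta$, so Assumption R(ii) holds.

```latex
% Derivative matrices H(theta) = d h(theta) / d theta' for three example
% restrictions, under the assumption d = 3.
\begin{align*}
h(\theta) &= \theta_1 - \theta_2:
  & H(\theta) &= \begin{pmatrix} 1 & -1 & 0 \end{pmatrix}, & r &= 1, \\
h(\theta) &= \theta_1\theta_2 - \theta_3:
  & H(\theta) &= \begin{pmatrix} \theta_2 & \theta_1 & -1 \end{pmatrix}, & r &= 1, \\
h(\theta) &= \begin{pmatrix} \theta_1 \\ \theta_2 + \theta_3 - 1 \end{pmatrix}:
  & H(\theta) &= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}, & r &= 2.
\end{align*}
```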

The Wald statistic for testing $H_0$ is defined to be

$$W_n = n\,h(\hat\theta_n)'\left(\hat H_n \hat B_n^{-1}\hat\Omega_n \hat B_n^{-1}\hat H_n'\right)^{-1}h(\hat\theta_n), \text{ where } \hat H_n = \frac{\partial}{\partial\theta'}h(\hat\theta_n) \qquad (3)$$

and $\hat B_n$ and $\hat\Omega_n$ are consistent estimators of $B_0$ and $\Omega_0$, respectively. See Lecture 5 for the choice of
$\hat B_n$ and $\hat\Omega_n$.

As defined, the Wald statistic is a quadratic form in the (unrestricted) estimator $h(\hat\theta_n)$ of the
restrictions $h(\theta_0)$. The quadratic form has a positive definite (pd) weight matrix. If $H_0$ is true,
then $h(\hat\theta_n)$ should be close to zero and the quadratic form $W_n$ in $h(\hat\theta_n)$ also should be close to zero.
On the other hand, if $H_0$ is false, $h(\hat\theta_n)$ should be close to $h(\theta_0) \neq 0$ and the quadratic form $W_n$
in $h(\hat\theta_n)$ should be different from zero. Thus, the Wald test rejects $H_0$ when $W_n$ is sufficiently
large. Large sample critical values are provided by Theorem 6.1 below.
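
To make the construction concrete, here is a minimal sketch, not from the original notes, of the Wald statistic in the special case of a single restriction ($r = 1$), where the weight matrix reduces to a scalar. The function names and all numerical inputs are hypothetical; `V_hat` stands for an estimate of $B_0^{-1}\Omega_0 B_0^{-1}$ that is assumed to have been computed already.

```python
def quad_form(x, A):
    """Return x' A x for a vector x and a square matrix A (nested lists)."""
    k = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(k) for j in range(k))


def wald_statistic_scalar(n, h_hat, H_hat, V_hat):
    """Wald statistic for one restriction (r = 1).

    n     : sample size
    h_hat : value of h at the unrestricted estimate
    H_hat : derivative of h at the unrestricted estimate (a 1 x d row)
    V_hat : estimate of the asymptotic covariance B0^{-1} Omega0 B0^{-1}
    """
    return n * h_hat ** 2 / quad_form(H_hat, V_hat)


# Hypothetical inputs for H0: theta1 - theta2 = 0.
theta_hat = [1.2, 1.0]
h_hat = theta_hat[0] - theta_hat[1]
H_hat = [1.0, -1.0]
V_hat = [[1.0, 0.5], [0.5, 2.0]]
W_n = wald_statistic_scalar(100, h_hat, H_hat, V_hat)
print(W_n)  # roughly 2, since H V H' = 2 and n * h^2 is about 4
```

With these made-up numbers the statistic is well below 3.84, the 5% critical value of a chi-square distribution with one degree of freedom, so the restriction would not be rejected.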

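The LM and QLR statistics depend on a restricted extremum estimator of $\theta_0$. The restricted
extremum estimator, $\tilde\theta_n$, is an estimator that satisfies

$$\tilde\theta_n \in \Theta, \quad h(\tilde\theta_n) = 0, \quad \text{and} \quad Q_n(\tilde\theta_n) \le \inf\{Q_n(\theta) : \theta \in \Theta,\ h(\theta) = 0\} + o_p(n^{-1/2}). \qquad (4)$$

The LM statistic is defined by

$$LM_n = n\,\frac{\partial}{\partial\theta'}Q_n(\tilde\theta_n)\,\tilde B_n^{-1}\tilde H_n'\left(\tilde H_n \tilde B_n^{-1}\tilde\Omega_n \tilde B_n^{-1}\tilde H_n'\right)^{-1}\tilde H_n \tilde B_n^{-1}\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n), \text{ where } \tilde H_n = \frac{\partial}{\partial\theta'}h(\tilde\theta_n) \qquad (5)$$

and $\tilde B_n$ and $\tilde\Omega_n$ are consistent estimators under $H_0$ of $B_0$ and $\Omega_0$, respectively, that are constructed
using the restricted estimator $\tilde\theta_n$ rather than $\hat\theta_n$.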

The LM statistic is a quadratic form in the vector of derivatives of the criterion function evaluated
at the restricted estimator $\tilde\theta_n$. The quadratic form has a pd weight matrix. If the null
hypothesis is true, then the unrestricted estimator should be close to satisfying the restrictions.
In this case, the restricted and unrestricted estimators, $\tilde\theta_n$ and $\hat\theta_n$, should be close. In turn, this
implies that the derivatives of the criterion function at $\tilde\theta_n$ and $\hat\theta_n$ should be close. The latter,
$\frac{\partial}{\partial\theta}Q_n(\hat\theta_n)$, equals zero by the first-order conditions for unrestricted minimization of $Q_n(\theta)$ over $\Theta$.
(More precisely, $\frac{\partial}{\partial\theta}Q_n(\hat\theta_n)$ is only close to zero, viz., $o_p(n^{-1/2})$, by Assumption EE2, since $\hat\theta_n$ is not
required to exactly minimize $Q_n(\theta)$, just approximately minimize it.) Then, $\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n)$ should be
close to zero under $H_0$ and the quadratic form, $LM_n$, in $\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n)$ also should be close to zero. On
the other hand, if $H_0$ is false, then $\tilde\theta_n$ and $\hat\theta_n$ should not be close, $\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n)$ should not be close to
zero, and the quadratic form, $LM_n$, in $\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n)$ also should not be close to zero. In consequence,
the LM test rejects $H_0$ when the $LM_n$ statistic is sufficiently large. Asymptotic critical values for
$LM_n$ are determined by Theorem 6.1 below.

Note that $LM_n$ can be computed without computing $\hat\theta_n$. This can have computational advantages
if the criterion function is simpler when the restrictions $h(\theta) = 0$ are imposed than when unrestricted.
For example, if the unrestricted model is a nonlinear regression model and the restricted
model is a linear model, then the LS estimator has a closed-form solution under the restrictions,
but not otherwise.

Next, we consider the QLR statistic. It has the proper asymptotic null distribution only when 
the following assumption holds: 

Assumption QLR: (i) $\Omega_0 = cB_0$ for some scalar constant $c \neq 0$.

(ii) $\hat c_n \to_p c$ for some sequence of non-zero random variables $\{\hat c_n : n \ge 1\}$.

For example, in the ML example with correctly specified model, the information matrix equality
yields $\Omega_0 = B_0$, so Assumption QLR holds with $c = \hat c_n = 1$. In the LS example with conditionally
homoskedastic errors (i.e., $E(U_i|X_i) = 0$ a.s. and $E(U_i^2|X_i) = \sigma^2$ a.s.), Assumption QLR holds with
$c = \sigma^2$ and

$$\hat c_n = \frac{1}{n}\sum_{i=1}^n \left(Y_i - g(X_i, \hat\theta_n)\right)^2. \qquad (6)$$

In the GMM, MD, and TS examples with optimal weight matrix (e.g., $A'A = V_0^{-1}$ for the GMM and
MD estimators, see Lecture 5), Assumption QLR holds with $c = \hat c_n = 1$, since $\Omega_0 = B_0 = \Gamma_0' V_0^{-1}\Gamma_0$.

The QLR statistic is defined by

$$QLR_n = 2n\left(Q_n(\tilde\theta_n) - Q_n(\hat\theta_n)\right)/\hat c_n. \qquad (7)$$

When $H_0$ is true, the restricted and unrestricted estimators, $\tilde\theta_n$ and $\hat\theta_n$, should be close and,
hence, so should the value of the criterion function evaluated at these parameter values. Thus,
under $H_0$, the statistic $QLR_n$ should be close to zero. Under $H_1$, $QLR_n$ should be noticeably
larger than zero, since the minimization of $Q_n(\theta)$ over the restricted parameter space, which does
not include the minimizer $\theta_0$ of $Q(\theta)$, should leave $Q_n(\tilde\theta_n)$ noticeably larger than $Q_n(\hat\theta_n)$. In
consequence, the QLR test rejects $H_0$ when $QLR_n$ is sufficiently large. Asymptotic critical values
are provided by Theorem 6.1 below.

For the ML example, the QLR statistic equals the standard likelihood ratio statistic, viz., minus 
two times the logarithm of the likelihood ratio. 
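
The QLR statistic only needs the two minimized criterion values and the scale estimate. The sketch below, which is not part of the original notes, computes it for a toy LS problem with a constant regression function, $g(X_i, \theta) = \theta$, and the restriction $\theta = 0$. The normalization $Q_n(\theta) = \frac{1}{2n}\sum_i (Y_i - g(X_i,\theta))^2$ is an assumption made here so that the scale factor is the error variance; the data and function names are hypothetical.

```python
def ls_criterion(y, fitted):
    """Assumed LS criterion: Q_n = (1 / (2n)) * sum of squared residuals."""
    n = len(y)
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (2 * n)


def qlr_statistic(n, Q_restricted, Q_unrestricted, c_hat):
    """QLR_n = 2n (Q_n(restricted) - Q_n(unrestricted)) / c_hat."""
    return 2 * n * (Q_restricted - Q_unrestricted) / c_hat


# Toy data (made up). Model: Y_i = theta + U_i, H0: theta = 0.
y = [1.0, 2.0, 3.0, 4.0]
n = len(y)
theta_hat = sum(y) / n    # unrestricted LS estimate (the sample mean)
theta_tilde = 0.0         # restricted estimate imposed by H0
Q_u = ls_criterion(y, [theta_hat] * n)
Q_r = ls_criterion(y, [theta_tilde] * n)
# Scale estimate: average squared unrestricted residual.
c_hat = sum((yi - theta_hat) ** 2 for yi in y) / n
qlr = qlr_statistic(n, Q_r, Q_u, c_hat)
print(qlr)  # equals n * mean(y)^2 / c_hat for this toy model
```

In this toy model the statistic reduces to $n\bar Y^2/\hat c_n$, the square of the usual t-ratio for the mean, which illustrates why the QLR and Wald statistics behave alike in large samples.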

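When Assumption QLR holds, the $LM_n$ statistic simplifies to

$$LM_n = n\,\frac{\partial}{\partial\theta'}Q_n(\tilde\theta_n)\,\tilde B_n^{-1}\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n)/\hat c_n \qquad (8)$$

(since in this case one can take $\tilde\Omega_n = \hat c_n \tilde B_n$ and $\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n) = -\tilde H_n'\tilde\lambda_n$ for some vector $\tilde\lambda_n$ of Lagrange
multipliers).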

For the LM and QLR tests, we use the following assumption. 

Assumption REE: $\tilde\theta_n \to_p \theta_0$ under $H_0$.

This assumption can be verified using the results of Lecture 3 with the parameter space $\Theta$ replaced
by the restricted parameter space $\tilde\Theta = \{\theta \in \Theta : h(\theta) = 0\}$.

For the Wald and LM tests, we assume that consistent estimators under $H_0$ of $B_0$ and $\Omega_0$ are
used in constructing the test statistics:

Assumption COV: For the Wald statistic, $\hat B_n \to_p B_0$ and $\hat\Omega_n \to_p \Omega_0$ under $H_0$. For the LM
statistic, $\tilde B_n \to_p B_0$ and $\tilde\Omega_n \to_p \Omega_0$ under $H_0$.

Estimators that satisfy this assumption are discussed in Lecture 5.

Theorem 6.1: (a) Under Assumptions EE2, CF, R, and COV,

$$W_n \to_d \chi^2_r \text{ under } H_0,$$

where $\chi^2_r$ denotes a chi-square distribution with $r$ degrees of freedom.

(b) Under Assumptions EE2, CF, R, COV, and REE,

$$LM_n \to_d \chi^2_r \text{ under } H_0.$$

(c) Under Assumptions EE2, CF, R, COV, REE, and QLR,

$$QLR_n \to_d \chi^2_r \text{ under } H_0.$$

One rejects $H_0$ using the Wald test if $W_n > k_{r,\alpha}$, where $k_{r,\alpha}$ is the $1 - \alpha$ quantile of the $\chi^2_r$
distribution. The LM and QLR tests are defined analogously. These tests have significance level
approximately equal to $\alpha$ for large $n$.
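
The critical value $k_{r,\alpha}$ can be obtained from any statistical package. For readers who want to see the computation, the following self-contained sketch (not part of the original notes) evaluates the chi-square distribution function through the standard power series for the regularized lower incomplete gamma function and inverts it by bisection. The function names are hypothetical, and the series is intended only for the moderate arguments that arise with typical $r$ and $\alpha$.

```python
import math


def chi2_cdf(x, r):
    """Chi-square CDF with r degrees of freedom via the series for the
    regularized lower incomplete gamma function P(r/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = r / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_quantile(p, r):
    """Quantile of the chi-square(r) distribution found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, r) < p:   # expand until the quantile is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, r) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha = 0.05
k_1 = chi2_quantile(1 - alpha, 1)   # critical value for r = 1
k_2 = chi2_quantile(1 - alpha, 2)   # critical value for r = 2
print(k_1, k_2)
```

For $r = 2$ the quantile has the closed form $-2\log\alpha$, which gives a simple check on the routine. The test then rejects $H_0$ when the computed statistic exceeds the returned value.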

Under sequences of local alternatives to $H_0 : h(\theta_0) = 0$, the Wald, LM, and QLR test statistics
have noncentral chi-squared distributions with the same noncentrality parameter; see the next lecture.
Thus, the three tests, Wald, LM, and QLR, have the same large sample power functions. One cannot
choose between these tests based on (first order) large sample power. The choice can be made on
computational grounds. It also can be made based on the closeness of the true finite sample size of
a test to its nominal asymptotic size $\alpha$. The best test according to the latter criterion depends on
the particular testing context. The folklore (backed up by various simulation studies), however, is
that the Wald test over-rejects in many contexts and the LM and QLR tests often are preferable
in consequence.

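Proof of Theorem 6.1: First we establish part (a). Element-by-element mean value expansions
of $h(\hat\theta_n)$ about $\theta_0$ give

$$\sqrt{n}\,h(\hat\theta_n) = \sqrt{n}\,h(\theta_0) + \frac{\partial}{\partial\theta'}h(\theta_n^*)\sqrt{n}(\hat\theta_n - \theta_0)$$
$$= H\sqrt{n}(\hat\theta_n - \theta_0) + o_p(1) \qquad (9)$$
$$\to_d Z_0 \sim N(0, HB_0^{-1}\Omega_0 B_0^{-1}H'),$$

where $\theta_n^*$ lies between $\hat\theta_n$ and $\theta_0$ (and, hence, satisfies $\theta_n^* \to_p \theta_0$ by Assumption EE2(i)), the second
equality holds because $h(\theta_0) = 0$ under $H_0$ and $\frac{\partial}{\partial\theta'}h(\theta_n^*) \to_p \frac{\partial}{\partial\theta'}h(\theta_0) = H$ using Assumption R,
and the convergence in distribution uses Theorem 5.1.

By Assumptions COV, EE2, and R,

$$\left(\hat H_n \hat B_n^{-1}\hat\Omega_n \hat B_n^{-1}\hat H_n'\right)^{-1} \to_p \left(HB_0^{-1}\Omega_0 B_0^{-1}H'\right)^{-1}. \qquad (10)$$

Combining (9) and (10) gives

$$W_n = \sqrt{n}\,h(\hat\theta_n)'\left(\hat H_n \hat B_n^{-1}\hat\Omega_n \hat B_n^{-1}\hat H_n'\right)^{-1}\sqrt{n}\,h(\hat\theta_n) \to_d Z_0'\left(HB_0^{-1}\Omega_0 B_0^{-1}H'\right)^{-1}Z_0 = Z'Z, \qquad (11)$$

where $Z = (HB_0^{-1}\Omega_0 B_0^{-1}H')^{-1/2}Z_0 \sim N(0, I_r)$ and $Z'Z \sim \chi^2_r$.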

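Next, we establish part (b). Element-by-element mean value expansions of $\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n)$ and $h(\tilde\theta_n)$
about $\theta_0$ yield

$$\sqrt{n}\,\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n) = \sqrt{n}\,\frac{\partial}{\partial\theta}Q_n(\theta_0) + \frac{\partial^2}{\partial\theta\,\partial\theta'}Q_n(\theta_n^*)\sqrt{n}(\tilde\theta_n - \theta_0) \quad \text{and}$$
$$0 = \sqrt{n}\,h(\tilde\theta_n) = \sqrt{n}\,h(\theta_0) + \frac{\partial}{\partial\theta'}h(\theta_n^+)\sqrt{n}(\tilde\theta_n - \theta_0)$$
$$= \frac{\partial}{\partial\theta'}h(\theta_n^+)\sqrt{n}(\tilde\theta_n - \theta_0), \qquad (12)$$

where $\theta_n^* \to_p \theta_0$ and $\theta_n^+ \to_p \theta_0$. Let $B_n(\theta) = \frac{\partial^2}{\partial\theta\,\partial\theta'}Q_n(\theta)$ and $H(\theta) = \frac{\partial}{\partial\theta'}h(\theta)$. Then, pre-multiplication
of the first equation of (12) by $H(\theta_n^+)B_n(\theta_n^*)^{-1}$ gives

$$H(\theta_n^+)B_n(\theta_n^*)^{-1}\sqrt{n}\,\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n)$$
$$= H(\theta_n^+)B_n(\theta_n^*)^{-1}\sqrt{n}\,\frac{\partial}{\partial\theta}Q_n(\theta_0) + H(\theta_n^+)\sqrt{n}(\tilde\theta_n - \theta_0)$$
$$= H(\theta_n^+)B_n(\theta_n^*)^{-1}\sqrt{n}\,\frac{\partial}{\partial\theta}Q_n(\theta_0) \qquad (13)$$
$$\to_d Z_0 \sim N(0, HB_0^{-1}\Omega_0 B_0^{-1}H')$$

using the second equation of (12) and Assumptions CF(iii), CF(iv), REE, and R.

Combining (13), the fact that $\sqrt{n}\,\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n) = O_p(1)$ by (17) below, $H(\theta_n^+) \to_p H$, $\tilde H_n \to_p H$,
$B_n(\theta_n^*) \to_p B_0$, and $\tilde B_n \to_p B_0$ gives

$$\tilde H_n \tilde B_n^{-1}\sqrt{n}\,\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n) \to_d Z_0 \sim N(0, HB_0^{-1}\Omega_0 B_0^{-1}H'). \qquad (14)$$

By Assumptions COV, REE, and R,

$$\left(\tilde H_n \tilde B_n^{-1}\tilde\Omega_n \tilde B_n^{-1}\tilde H_n'\right)^{-1} \to_p \left(HB_0^{-1}\Omega_0 B_0^{-1}H'\right)^{-1}. \qquad (15)$$

Combining (14) and (15) gives the desired result

$$LM_n \to_d Z_0'\left(HB_0^{-1}\Omega_0 B_0^{-1}H'\right)^{-1}Z_0 = Z'Z \sim \chi^2_r \qquad (16)$$

under $H_0$, where $Z$ is as above.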
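We now show that

$$\sqrt{n}\,\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n) = O_p(1). \qquad (17)$$

With probability that goes to one as $n \to \infty$, $\tilde\theta_n$ is in the interior of $\Theta$ and there exists a random
vector $\tilde\lambda_n$ of Lagrange multipliers such that

$$\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n) + H(\tilde\theta_n)'\tilde\lambda_n = O_p(n^{-1/2}). \qquad (18)$$

Equations (13) and (18) combine to give

$$H(\theta_n^+)B_n(\theta_n^*)^{-1}H(\tilde\theta_n)'\sqrt{n}\,\tilde\lambda_n = O_p(1). \qquad (19)$$

Since $H(\theta_n^+)B_n(\theta_n^*)^{-1}H(\tilde\theta_n)' \to_p HB_0^{-1}H'$ and $HB_0^{-1}H'$ is nonsingular, (19) implies $\sqrt{n}\,\tilde\lambda_n = O_p(1)$.
This and (18) imply (17).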

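Next, we establish part (c). A two-term Taylor expansion of $Q_n(\tilde\theta_n)$ about $\hat\theta_n$ gives

$$QLR_n = 2n\left(Q_n(\tilde\theta_n) - Q_n(\hat\theta_n)\right)/\hat c_n$$
$$= 2n\,\frac{\partial}{\partial\theta'}Q_n(\hat\theta_n)(\tilde\theta_n - \hat\theta_n)/\hat c_n + n(\tilde\theta_n - \hat\theta_n)'\frac{\partial^2}{\partial\theta\,\partial\theta'}Q_n(\theta_n^*)(\tilde\theta_n - \hat\theta_n)/\hat c_n \qquad (20)$$
$$= o_p(1) + n(\tilde\theta_n - \hat\theta_n)'\frac{\partial^2}{\partial\theta\,\partial\theta'}Q_n(\theta_n^*)(\tilde\theta_n - \hat\theta_n)/\hat c_n,$$

where $\theta_n^*$ lies between $\tilde\theta_n$ and $\hat\theta_n$ (and, hence, satisfies $\theta_n^* \to_p \theta_0$) and the third equality holds
because $n^{1/2}\frac{\partial}{\partial\theta}Q_n(\hat\theta_n) = o_p(1)$ by EE2(ii) and $n^{1/2}(\tilde\theta_n - \hat\theta_n) = O_p(1)$ by (22) below.

Element-by-element mean-value expansions of $\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n)$ about $\hat\theta_n$ give

$$\sqrt{n}\,\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n) = \sqrt{n}\,\frac{\partial}{\partial\theta}Q_n(\hat\theta_n) + \frac{\partial^2}{\partial\theta\,\partial\theta'}Q_n(\theta_n^+)\sqrt{n}(\tilde\theta_n - \hat\theta_n)$$
$$= o_p(1) + (B_0 + o_p(1))\sqrt{n}(\tilde\theta_n - \hat\theta_n), \qquad (21)$$

where $\theta_n^+ \to_p \theta_0$. Since $B_0$ is nonsingular, (21) implies

$$\sqrt{n}(\tilde\theta_n - \hat\theta_n) = B_0^{-1}\sqrt{n}\,\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n) + o_p(1). \qquad (22)$$

Substitution of (22) into (20) gives

$$QLR_n = o_p(1) + n\,\frac{\partial}{\partial\theta'}Q_n(\tilde\theta_n)\,B_0^{-1}B_n(\theta_n^*)B_0^{-1}\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n)/\hat c_n$$
$$= o_p(1) + n\,\frac{\partial}{\partial\theta'}Q_n(\tilde\theta_n)\,\tilde B_n^{-1}\frac{\partial}{\partial\theta}Q_n(\tilde\theta_n)/\hat c_n$$
$$= o_p(1) + LM_n, \qquad (23)$$

where the second equality holds by (17) and $B_0^{-1}B_n(\theta_n^*)B_0^{-1} = \tilde B_n^{-1} + o_p(1)$ and the third equality
holds by the simplified expression for $LM_n$ given in (8). Part (c) now holds by (23) and part (b). □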

3 One-Sided Hypotheses 

Sometimes, one-sided hypotheses are of interest:

$$H_0 : h(\theta_0) \le 0 \quad \text{vs.} \quad H_1 : h(\theta_0) > 0. \qquad (24)$$

We only consider one-dimensional hypotheses, i.e., those with real-valued $h(\cdot)$. Testing
multi-dimensional inequality constraints is much trickier, as explained in Wolak (1991).

We use the same assumptions as above. We consider the Wald statistic only. Deriving the asymptotic
properties of the LM and QLR statistics requires deriving those for inequality-constrained
extremum estimators, which are not covered by the previous lectures. The Wald statistic for testing
$H_0$ is defined to be

$$W_n = \frac{\left[\sqrt{n}\,h(\hat\theta_n)\right]_+^2}{\hat H_n \hat B_n^{-1}\hat\Omega_n \hat B_n^{-1}\hat H_n'}, \text{ where } [a]_+ = \max\{0, a\}. \qquad (25)$$

As defined, the Wald statistic is a quadratic form in the positive part of the (unrestricted) estimator
$h(\hat\theta_n)$ of the restrictions $h(\theta_0)$. The quadratic form has a positive definite (pd) weight matrix. If
$H_0$ is true, then $h(\hat\theta_n)$ should be close to zero or negative and the quadratic form $W_n$ in $[h(\hat\theta_n)]_+$ also
should be close to zero. On the other hand, if $H_0$ is false, $h(\hat\theta_n)$ should be close to $h(\theta_0) > 0$ and
the quadratic form $W_n$ in $[h(\hat\theta_n)]_+$ should be different from zero. Thus, the Wald test rejects $H_0$
when its value is sufficiently large. Large sample critical values are provided by Theorem 6.2 below.
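
A minimal sketch of the one-sided Wald test follows; it is not part of the original notes. It assumes a real-valued $h$ and uses the boundary-case limit from Theorem 6.2, namely the squared positive part of a standard normal variable, for which $P([Z]_+^2 > c) = P(Z > \sqrt{c})$ when $c > 0$, so the level-$\alpha$ critical value (for $\alpha < 1/2$) is the squared $1 - \alpha$ standard normal quantile. The function names and numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist


def one_sided_wald(n, h_hat, H_hat, V_hat):
    """One-sided Wald statistic: [sqrt(n) * h_hat]_+^2 / (H V H').

    V_hat is an estimate of B0^{-1} Omega0 B0^{-1}; h is real-valued.
    """
    k = len(H_hat)
    avar = sum(H_hat[i] * V_hat[i][j] * H_hat[j]
               for i in range(k) for j in range(k))
    positive_part = max(0.0, math.sqrt(n) * h_hat)
    return positive_part ** 2 / avar


def one_sided_critical_value(alpha):
    """Critical value c with P([Z]_+^2 > c) = alpha, valid for alpha < 1/2."""
    return NormalDist().inv_cdf(1 - alpha) ** 2


# Hypothetical inputs for H0: theta1 - theta2 <= 0.
H_hat = [1.0, -1.0]
V_hat = [[1.0, 0.5], [0.5, 2.0]]
W_neg = one_sided_wald(100, -0.3, H_hat, V_hat)   # estimate inside the null
W_pos = one_sided_wald(100, 0.2, H_hat, V_hat)    # estimate outside the null
cv = one_sided_critical_value(0.05)
print(W_neg, W_pos, cv, W_pos > cv)
```

The one-sided 5% critical value is about 2.71, smaller than the two-sided value of about 3.84, which reflects the fact that only positive deviations of $h(\hat\theta_n)$ count as evidence against $H_0$.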

Under the one-sided null hypothesis, there are two cases: $h(\theta_0) < 0$ and $h(\theta_0) = 0$. The
asymptotic distributions of the Wald statistic are different in the two cases. Theorem 6.2 summarizes
the result. Define the mixed chi-squared distribution

$$\chi^2_+ = \left[N(0, 1)\right]_+^2. \qquad (26)$$

Let "wpa1" denote "with probability approaching one".

Theorem 6.2: (a) Under Assumptions EE2, CF, R, and COV,

$$W_n \begin{cases} = 0 \text{ wpa1} & \text{if } h(\theta_0) < 0 \\ \to_d \chi^2_+ & \text{if } h(\theta_0) = 0. \end{cases}$$

Proof of Theorem 6.2: When $h(\theta_0) < 0$, $h(\hat\theta_n) < 0$ wpa1 because $h(\hat\theta_n) \to_p h(\theta_0)$ by Assumptions
R(i) and EE2(i). Therefore, $W_n = 0$ wpa1. When $h(\theta_0) = 0$, the proof is the same as that
for Theorem 6.1(a) except for (11). The one-sided counterpart of (11) is:


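$$W_n = \frac{\left[\sqrt{n}\,h(\hat\theta_n)\right]_+^2}{\hat H_n \hat B_n^{-1}\hat\Omega_n \hat B_n^{-1}\hat H_n'} \to_d \frac{[Z_0]_+^2}{HB_0^{-1}\Omega_0 B_0^{-1}H'} = \left[\left(HB_0^{-1}\Omega_0 B_0^{-1}H'\right)^{-1/2}Z_0\right]_+^2 =_d \chi^2_+. \qquad (27)$$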